The dataset contains quarterly tourist arrivals from the United States between 1981 and 2012, however we only.
It includes one time series variable
(US arrivals), measured four times per year
(quarterly).
The data were originally obtained from the fpp2 package and are expressed in thousands of visitors.
library(readr)
library(fpp2)
## Registered S3 method overwritten by 'quantmod':
## method from
## as.zoo.data.frame zoo
## ── Attaching packages ────────────────────────────────────────────── fpp2 2.5 ──
## ✔ ggplot2 4.0.0 ✔ fma 2.5
## ✔ forecast 8.24.0 ✔ expsmooth 2.3
##
library(TTR)
arrivals_data_csv <- read_csv("arrivals_quarterly.csv")
## Rows: 127 Columns: 5
## ── Column specification ────────────────────────────────────────────────────────
## Delimiter: ","
## chr (1): Quarter
## dbl (4): Japan, NZ, UK, US
##
## ℹ Use `spec()` to retrieve the full column specification for this data.
## ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
head(arrivals_data_csv)
## # A tibble: 6 × 5
## Quarter Japan NZ UK US
## <chr> <dbl> <dbl> <dbl> <dbl>
## 1 1981 Q1 14.8 49.1 45.3 32.3
## 2 1981 Q2 9.32 87.5 19.9 23.7
## 3 1981 Q3 10.2 85.8 24.8 24.5
## 4 1981 Q4 19.5 61.9 52.3 33.4
## 5 1982 Q1 17.1 42.0 53.6 33.5
## 6 1982 Q2 10.6 63.1 34.8 28.4
spec(arrivals_data_csv)
## cols(
## Quarter = col_character(),
## Japan = col_double(),
## NZ = col_double(),
## UK = col_double(),
## US = col_double()
## )
arrivals_ts_st <- ts(arrivals_data_csv$US, start=c(1981,1), frequency=4)
# drop the data of 2012 since it only has 3 quarter
arrivals_ts <- window(arrivals_ts_st, end = c(2011, 4))
head(arrivals_ts)
## Qtr1 Qtr2 Qtr3 Qtr4
## 1981 32.316 23.721 24.533 33.438
## 1982 33.527 28.366
tail(arrivals_ts)
## Qtr1 Qtr2 Qtr3 Qtr4
## 2010 111.615 127.002
## 2011 125.264 101.814 101.925 127.150
plot(arrivals_ts)
The time series of U.S. tourist arrivals (1981–2012) shows several key characteristics:
Upward trend: There is a clear long-term
increase in tourist arrivals, particularly from the early 1980s through
the late 1990s.
Strong seasonality: Regular quarterly peaks and troughs indicate recurring seasonal patterns, likely associated with holiday or vacation cycles.
Increasing variability: The fluctuations become larger over time, suggesting more volatile tourism demand in later years.
Recent flattening: After around 2000, the growth trend appears to slow down or level off, possibly due to economic factors or global events affecting travel.
Overall, the series demonstrates both trend and seasonality
acf(arrivals_ts)
The Autocorrelation Function (ACF) plot for the U.S. tourist arrivals time series shows the following characteristics:
Overall, the ACF plot supports the earlier findings that the data exhibits both trend and seasonality
# Display summary statistics
summary(arrivals_ts)
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 23.72 63.47 85.43 84.15 108.53 136.09
# Create a box plot for visualizing distribution
boxplot(arrivals_ts,
main = "Boxplot of U.S. Tourist Arrivals (1981–2012)",
ylab = "Number of Arrivals (in thousands)",
col = "lightblue",
border = "darkblue")
# box plot for visulizing distribution by year
years <- floor(time(arrivals_ts))
values <- as.numeric(arrivals_ts)
df <- data.frame(year = years, value = values)
boxplot(value ~ year, data = df,
main = "Boxplots by Year",
xlab = "Year", ylab = "Arrivals (thousands)",
col = "lightblue", border = "gray40", outline = TRUE, las = 2, cex.axis = 0.7)
The summary statistics for the U.S. tourist arrivals (1981–2012) are as follows:
| Statistic | Value |
|---|---|
| Minimum | 23.72 |
| 1st Quartile (Q1) | 63.95 |
| Median | 85.88 |
| Mean | 84.85 |
| 3rd Quartile (Q3) | 108.98 |
| Maximum | 136.09 |
Interpretation:
# Model output
naive_forecast <- naive(arrivals_ts,8)
plot(naive_forecast)
naive_residual = residuals(naive_forecast)
plot(naive_residual, main="Residuals from Naïve Forecast", ylab="Residuals", xlab = "Year")
abline(h=0, col="red")
The residuals are centered near zero (no major bias). These patterns indicate that the Naïve method fails to fully account for the seasonal and trend components in the data.
hist(naive_residual, main="Histogram of Naïve Forecast Residuals",
xlab="Residuals", col="lightblue", border="white")
The residuals are centered near zero, but the distribution isn’t symmetric.
That indicats the Naïve model doesn’t fit the data perfectly — some structure like seasonality is still left in the residuals.
plot(fitted(naive_forecast), naive_residual,
main="Fitted Values vs Residuals (Naïve Model)",
xlab="Fitted Values", ylab="Residuals")
abline(h=0, col="red")
There is no functional relationship between the fitted values and the residuals. The plot points appear clustered or randomly scattered (as seen in your screenshot), because the time order no longer represents any meaningful model fitting logic. As a result, this plot is not as informative or meaningful.
plot(arrivals_ts, naive_residual,
main = "Actual Values vs Residuals (Naïve Model)",
xlab = "Actual Values", ylab = "Residuals",
type = "p", pch = 19, col = "darkblue", cex= 0.8)
abline(h = 0, col = "red")
Acf(naive_residual, main="ACF of Residuals - Naïve Forecast")
The ACF plot of residuals shows significant autocorrelation at seasonal lags (e.g., 4, 8, 12), indicating that the Naïve model has not fully captured the underlying seasonal pattern in the time series. Therefore, the residuals are not white noise.
accuracy(naive_forecast)
## ME RMSE MAE MPE MAPE MASE
## Training set 0.7710081 12.47908 9.706016 -0.08056945 11.91185 1.301303
## ACF1
## Training set -0.1796656
naive_forecast$mean
## Qtr1 Qtr2 Qtr3 Qtr4
## 2012 127.15 127.15 127.15 127.15
## 2013 127.15 127.15 127.15 127.15
plot(naive_forecast$mean)
autoplot(naive_forecast) +
ggtitle("Naïve Forecast for Next 8 Quarters") +
ylab("Tourist Arrivals (Thousands)") +
xlab("Year") +
theme_minimal()
The Naïve model’s accuracy is moderate. The RMSE of 12.48 and MAPE around 11.9% suggest that the model captures general stability in the series but cannot adjust for trend or seasonality.
The Naïve forecast provides a simple benchmark model with easy interpretation but limited predictive capability. It serves as a baseline for comparison against more sophisticated models such as Holt’s Trend or Holt-Winters.
# Compute moving averages of different orders
MA3_forecast <- ma(arrivals_ts, order = 3)
MA6_forecast <- ma(arrivals_ts, order = 6)
MA9_forecast <- ma(arrivals_ts, order = 9)
# Plot original time series
plot(arrivals_ts, main = "Simple Moving Averages (Orders 3, 6, 9)",
ylab = "Tourist Arrivals (Thousands)", xlab = "Year", col = "black")
# Add the three moving averages
lines(MA3_forecast, col = "red", lwd = 2)
lines(MA6_forecast, col = "blue", lwd = 2)
lines(MA9_forecast, col = "green", lwd = 2)
# Forecast next 4 quarters using order 6 (bonus)
fit_ma6 <- auto.arima(na.omit(MA6_forecast))
forecast_ma6 <- forecast(fit_ma6, h = 4)
lines(forecast_ma6$mean, col = "purple", lwd = 2)
legend("topleft",
legend = c("Original", "MA(3)", "MA(6)", "MA(9)", "Forecast MA(6)"),
col = c("black", "red", "blue", "green", "purple"),
lwd = 2, bty = "n")
The chart shows the original time series along with three moving averages of different orders (3, 6, and 9).
Overall, increasing the order of the moving average reduces short-term noise but delays response to sudden changes in the data.
ss_forecast = ses(arrivals_ts, h = 4)
plot(ss_forecast)
ss_forecast$model
## Simple exponential smoothing
##
## Call:
## ses(y = arrivals_ts, h = 4)
##
## Smoothing parameters:
## alpha = 0.3625
##
## Initial states:
## l = 29.2613
##
## sigma: 11.0571
##
## AIC AICc BIC
## 1197.661 1197.861 1206.122
Overall, the SES model with α = 0.36 provides a moderately smoothed forecast, effectively capturing the level of the series without modeling trend or seasonality. The residual variance (σ ≈ 11) shows reasonable stability, making SES suitable for short-term level forecasting.
ss_residual = residuals(ss_forecast)
plot(ss_residual, main="Residuals from Simple Smoothing Forecast", ylab="Residuals", xlab = "Year")
abline(h=0, col="red")
hist(ss_residual, main = "Histogram of SS model Residual", xlab="Residuals", col="lightblue", border="white")
plot(fitted(ss_forecast), ss_residual,
main="Fitted Values vs Residuals (SS Model)",
xlab="Fitted Values", ylab="Residuals")
abline(h=0, col="red")
plot(arrivals_ts, ss_residual,
main = "Actual Values vs Residuals (SS Model)",
xlab = "Actual Values", ylab = "Residuals",
type = "p", pch = 19, col = "darkblue", cex= 0.8)
abline(h = 0, col = "red")
acf(ss_residual, main="ACF of Residuals - SS Forecast")
accuracy(ss_forecast)
## ME RMSE MAE MPE MAPE MASE ACF1
## Training set 1.933619 10.96758 8.848115 1.518322 10.87885 1.186282 0.06230072
ss_forecast$mean
## Qtr1 Qtr2 Qtr3 Qtr4
## 2012 116.1663 116.1663 116.1663 116.1663
plot(ss_forecast$mean)
hw_forecast = hw(arrivals_ts)
forecast_hw = forecast(hw_forecast, h = 4)
forecast_hw$model
## Holt-Winters' additive method
##
## Call:
## hw(y = arrivals_ts)
##
## Smoothing parameters:
## alpha = 0.4975
## beta = 1e-04
## gamma = 0.0921
##
## Initial states:
## l = 27.0003
## b = 0.7211
## s = 6.6781 -4.2792 -6.5704 4.1715
##
## sigma: 7.8361
##
## AIC AICc BIC
## 1118.013 1119.592 1143.395
hw_residual = residuals(forecast_hw)
#plot residual
plot(hw_residual, main = "Residuals from Holt Winter Model", ylab="Residuals", xlab = "Year")
#histogram
hist(hw_residual, main = "Histogram of Holt Winter model Residual", xlab="Residuals", col="lightblue", border="white")
# fitted value vs residuals
plot(fitted(forecast_hw), hw_residual,
main="Fitted Values vs Residuals (Holt Winter Model)",
xlab="Fitted Values", ylab="Residuals")
abline(h=0, col="red")
# real values vs residuals
plot(arrivals_ts, hw_residual,
main = "Actual Values vs Residuals (Holt Winter Model)",
xlab = "Actual Values", ylab = "Residuals",
type = "p", pch = 19, col = "darkblue", cex= 0.8)
abline(h = 0, col = "red")
#ACF plot of residuals
acf(hw_residual, main="ACF of Residuals - Holt Winter Forecast")
The residuals fluctuate randomly around zero, without any visible
pattern or systematic trend.
This suggests that the model has successfully captured the main
structure of the data (trend and seasonality).
No obvious autocorrelation or heteroscedasticity is observed, indicating
that the residuals behave like white noise.
The histogram of residuals appears approximately symmetric and
centered near zero, forming a bell-shaped curve.
This supports the assumption of normally distributed residuals and
suggests that the Holt–Winters model provides unbiased forecasts.
A few minor deviations at the tails indicate small outliers, but they
are not severe.
The residuals are scattered evenly around the horizontal zero line,
showing no clear pattern or curvature.
This confirms that there is no functional relationship between the
fitted values and residuals, and the variance remains fairly
constant.
In other words, the model does not systematically over- or under-predict
at any particular fitted value range.
Similar to the previous plot, residuals do not show any visible trend
against the actual observed values.
This reinforces that the model errors are randomly distributed and not
dependent on the magnitude of the actual data,
indicating a good model fit without bias across different levels of
tourist arrivals.
The ACF plot of residuals shows that most autocorrelation values fall
within the 95% confidence bands.
This implies that residuals are uncorrelated over time,
meeting the assumption of independence.
It also suggests that the Holt–Winters model has adequately captured the
serial dependence in the original time series.
The residual diagnostics confirm that the Holt–Winters additive model
provides a good fit for the data.
Residuals resemble white noise — centered around zero, normally
distributed, and uncorrelated —
indicating that the model effectively explains both trend and
seasonality in the tourist arrivals series.
accuracy(forecast_hw)
## ME RMSE MAE MPE MAPE MASE
## Training set -0.003138538 7.5791 5.505299 -0.3492589 7.191839 0.7381052
## ACF1
## Training set 0.0132075
forecast_hw$mean
## Qtr1 Qtr2 Qtr3 Qtr4
## 2012 125.2931 108.0475 113.8382 126.0028
plot(forecast_hw$mean)
The Holt–Winters additive model achieved strong accuracy metrics: -
ME = –0.083, RMSE = 7.58, MAE
= 5.51, MAPE = 7.18%, MASE =
0.74, ACF1 = 0.01.
These results indicate that forecast errors are small and
unbiased.
The low RMSE and MAPE show the model performs
substantially better than simpler methods such as Naïve or SES.
The near-zero ACF1 value confirms that residuals are
nearly uncorrelated, which means the model has captured most of the
systematic patterns.
The model predicts tourist arrivals (in thousands) for 2012 as follows: | Quarter | Forecasted Value | |———-|——————| | Q1 | 125.29 | | Q2 | 108.05 | | Q3 | 113.84 | | Q4 | 126.00 |
Overall, the model expects a temporary dip in mid-year (Q2–Q3)
followed by a recovery in Q4 —
a pattern consistent with the seasonal nature of U.S. tourist
arrivals.
Conclusion:
The Holt–Winters additive method delivers accurate, stable forecasts and
effectively models both trend and seasonal variations,
making it one of the most suitable approaches for this quarterly tourism
dataset.
# Create accuracy comparison table
model_comparison <- rbind(
naive = accuracy(naive_forecast),
ss = accuracy(ss_forecast),
ma = accuracy(forecast_ma6),
hw = accuracy(forecast_hw)
)
rownames(model_comparison) <- c( "Naive", "SES", "Moving Average","Holt Winter")
round(model_comparison[, c("RMSE","MAE","MAPE","MASE")], 3)
## RMSE MAE MAPE MASE
## Naive 12.479 9.706 11.912 1.301
## SES 10.968 8.848 10.879 1.186
## Moving Average 0.966 0.736 0.922 0.136
## Holt Winter 7.579 5.505 7.192 0.738
| Accuracy Measure | Best Model | Worst Model | Interpretation |
|---|---|---|---|
| RMSE | Holt–Winters | Naïve | Holt–Winters has the lowest forecast error magnitude. |
| MAE | Holt–Winters | Naïve | Smallest average deviation from actual values. |
| MAPE | Holt–Winters | Naïve | Most accurate in percentage terms. |
| MASE | Moving Average | Naïve | Moving Average performs best on a scaled basis, likely due to strong smoothing of short-term noise. |
The Holt–Winters model provides the most
accurate forecasts across nearly all error metrics, effectively
capturing both trend and seasonal variation.
The Naïve model performs the worst and serves mainly as
a baseline.
The Moving Average model performs well in terms of MASE
but tends to oversmooth and lag behind rapid changes.
Overall, the Holt–Winters additive model is the most
suitable forecasting technique for this time series,
providing accurate and interpretable predictions that align well with
observed seasonal tourism behavior.
Future values are projected to remain stable with a slight upward trend,
reflecting the maturity and cyclical nature of U.S. tourism over
time.